**Housekeeping
clear all
cap log close
log using ${rep_root}/logs/read_complaints_2021.log, text replace
set more off

**INCIDENTS
**Read raw incident data
import delimited using ${rep_root}/data/FOIA/FOIA_complaints_21-060-216_04282021/clear_case_info.csv, bindquote(strict) maxquotedrows(1000)

**Rename record ID number to standardize with other files
rename log_no cr_id
drop if missing(cr_id)

**Read start, complaint, and closed date
gen inc_start_dt = date(incident_date, "MDY", 2025)
gen inc_complaint_dt = date(complaint_date, "DMY", 2025)
gen inc_closed_dt = date(closed_date, "DMY", 2025)
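
**Optional check (a sketch, not part of the original pipeline): count raw date strings
**that failed to parse under the masks above. A large count would suggest that a mask
**does not match the raw format. Nothing is created or modified here
count if missing(inc_start_dt) & !missing(incident_date)
count if missing(inc_complaint_dt) & !missing(complaint_date)
count if missing(inc_closed_dt) & !missing(closed_date)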

**Indicator for complaint brought by civilian
gen inc_civilian = complainant_type=="CIVILIAN"

**Indicator for complaint involves police shooting
gen inc_shooting = police_shooting == "Yes"

**Record police district and beat
destring beat_of_incident, force gen(inc_beat)
gen inc_district = floor(inc_beat/100)

**Unify non-Chicago "districts" under 31 and set to missing if outside expected values
replace inc_district = 31 if inc_district==41
replace inc_district = . if !(inrange(inc_district, 1, 25) | inc_district==31)

**Mark if complaint category contains the word "domestic"
gen inc_any_domestic_i = strpos(category, "DOMESTIC")>0

**Standardize how addresses are written in the data
gen street_dir = ""
replace street_dir = "N" if street_direction=="North"
replace street_dir = "E" if street_direction=="East"
replace street_dir = "S" if street_direction=="South"
replace street_dir = "W" if street_direction=="West"

gen inc_address = string(street_no) + " " + street_dir + " " + street_name 
rename city inc_city
rename state inc_state
destring zip_cd, force gen(inc_zip)

**Keep only relevant variables and drop duplicate records with respect to these variables
keep cr_id inc_* investigating_agency

duplicates drop

**Some CR IDs appear more than once in the incident data because there is more than one incident
**associated with the complaint. We will organize these multiple incident reports wide to ensure the
**final dataset is unique by CR ID

**First sort by all the variables in the dataset, prioritizing cases that involve a shooting, then earlier cases
gsort cr_id -inc_shooting inc_start_dt inc_complaint_dt inc_closed_dt inc_civilian inc_beat inc_address inc_city inc_state inc_zip
by cr_id: gen idx = _n
by cr_id: gen inc_tot = _N

**Record if any of the underlying incidents are domestic
by cr_id: egen inc_any_domestic = max(inc_any_domestic_i)
drop inc_any_domestic_i

**Keep only the first 4 incidents
drop if idx>4

**Reshape the data wide within a complaint
rename inc_* inc_*_
rename inc_tot_ inc_tot
rename inc_any_domestic_ inc_any_domestic
reshape wide inc_*_, i(cr_id) j(idx)
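
**Optional check (a sketch, not part of the original pipeline): confirm the reshaped
**data are unique by CR ID, which the later m:1 merge on cr_id requires
isid cr_id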

**Save incident data
save ${rep_root}/data/inc, replace
clear

**VICTIMS
**Read raw victim data
import delimited using ${rep_root}/data/FOIA/FOIA_complaints_21-060-216_04282021/clear_subject.csv, bindquote(strict)

**There are some pure duplicates in this dataset (every datum identical). No idea why
duplicates drop

**Rename record ID number to standardize with other files
rename log_no cr_id

**Record indicator that victim is male
gen vic_male_ = gender=="MALE"
replace vic_male_ = . if missing(gender)

**Record simple race category (black, white, other)
gen vic_race_ = .
replace vic_race_ = 1 if race=="WHITE"
replace vic_race_ = 2 if race=="BLACK"
replace vic_race_ = 3 if !missing(race) & missing(vic_race_)

**Record victim's birth year and require it be in a reasonable range
rename birth_year vic_byr_
replace vic_byr_ = . if !inrange(vic_byr_, 1900, 2020)

**Categorize injury codes
gen vic_inj_ = .
replace vic_inj_ = 0 if inlist(injury_condition, "NO VISIBLE INJURY, APPARENTLY NORMAL", "NO VISIBLE INJURY, UNDER INFLUENCE", "UNKNOWN")
replace vic_inj_ = 1 if inlist(injury_condition, "INJURED, NOT HOSPITALIZED", "INJURED, NOT HOSPITALIZED, UNDER INFLUENCE", "INJURED, REFUSED MEDICAL AID", "INJURED, REFUSED MEDICAL AID, UNDER INFLUENCE")
replace vic_inj_ = 2 if inlist(injury_condition, "INJURED, HOSPITALIZED", "INJURED, HOSPITALIZED, UNDER INFLUENCE")
replace vic_inj_ = 3 if inlist(injury_condition, "DECEASED", "DECEASED, UNDER INFLUENCE")

**Check for any female victims across all victims in the complaint
sort cr_id
gen vic_female_i = vic_male_==0 & !missing(vic_male_)
by cr_id: egen vic_female_any = max(vic_female_i)
drop vic_female_i

**Check for any white victims across all victims in the complaint
gen vic_white_i = vic_race_==1 & !missing(vic_race_)
by cr_id: egen vic_white_any = max(vic_white_i)
drop vic_white_i

**Record birth year of the oldest and youngest victim in the complaint
**(the oldest victim has the earliest birth year)
by cr_id: egen vic_byr_oldest = min(vic_byr_)
by cr_id: egen vic_byr_youngest = max(vic_byr_)

**Restrict to just cr_id and created variables
keep cr_id vic_* 

**Complaints can have multiple victims. We will reshape wide by victim within each
**complaint

**Sort by victim information, prioritizing the victim who is most injured
gsort cr_id -vic_inj_ vic_byr_ vic_male_ vic_race_
by cr_id: gen idx = _n
by cr_id: gen vic_tot = _N

**Keeping only the first 4 victims
drop if idx>4
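
**Optional check (a sketch, not part of the original pipeline): count complaints that
**had more than 4 victims and were therefore truncated by the step above
count if vic_tot>4 & idx==1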

**Reshape victims wide within complaint
reshape wide vic_byr_ vic_male_ vic_race_ vic_inj_, i(cr_id) j(idx)

**Label race and injury codes
label define race_lab 1 "White" 2 "Black" 3 "Other" 
label define inj_lab 0 "Not Injured" 1 "Minor Injury" 2 "Severe Injury" 3 "Deceased"
forvalues i = 1/4 {
	label values vic_race_`i' race_lab
	label values vic_inj_`i' inj_lab
}

**Save victim data
save ${rep_root}/data/vic, replace
clear

**INVESTIGATORS
**Load raw investigator data
import delimited using ${rep_root}/data/FOIA/FOIA_complaints_21-060-216_04282021/clear_investigator.csv, bindquote(strict)

**There are some pure duplicates in this dataset (every datum identical). No idea why
duplicates drop

**Rename record ID number to standardize with other files
rename log_no cr_id

**Give each investigator (sorted by name) a unique ID
preserve
sort first_name last_name
by first_name last_name: keep if _n==1
gen inv_id = _n

keep first_name last_name inv_id
save ${rep_root}/data/inv_id, replace
restore

**Merge investigator IDs back to the main data
sort first_name last_name
merge m:1 first_name last_name using ${rep_root}/data/inv_id, nogen
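
**Optional check (a sketch, not part of the original pipeline): every record should
**have picked up an investigator ID, since the ID file was built from these same names
assert !missing(inv_id)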

**Record investigator unit
rename current_unit_assigned inv_unit

**Record whether investigator is marked as a supervisor on this case
gen inv_supervisor = 0
replace inv_supervisor = 1 if investigator_type=="SUPERVISING INVESTIGATOR"
replace inv_supervisor = . if missing(investigator_type)

**Record assignment date
gen inv_assigned_dt = date(assign_datetime, "DMY", 2025)

**Indicator for investigator is male
gen inv_male = gender=="M"
replace inv_male = . if missing(gender)

**Simple race category for investigator (white, black, other)
gen inv_race = .
replace inv_race = 1 if race=="WHI"
replace inv_race = 2 if race=="BLK"
replace inv_race = 3 if !missing(race) & missing(inv_race)

**Record investigator birthyear
rename birth_year inv_byr

**Record investigator names
rename first_name inv_first
rename last_name inv_last

**Keep just cr_id and created variables
keep cr_id inv_* 


**Most cases are assigned to more than one investigator. We will attempt to isolate
**the first supervising investigator here

**Sort by assignment date, then by whether investigator is a supervisor
gsort cr_id inv_assigned_dt -inv_supervisor inv_id

**Record when a case is transferring from a supervisor to a non-supervisor,
**then prioritize the first such transition
by cr_id: gen sup_to_nonsup = inv_supervisor==1 & inv_supervisor[_n+1]==0
gsort cr_id -sup_to_nonsup inv_assigned_dt inv_id
drop sup_to_nonsup

**Mark cases assigned to multiple supervisors in a single day
by cr_id: gen inv_simul_assign = inv_assigned_dt[_n]==inv_assigned_dt[_n+1] & inv_supervisor[_n]==1 & inv_supervisor[_n+1]==1 & !missing(inv_assigned_dt)

**Restrict to just the prioritized investigator, who is the first assigned supervisor
**that we observe giving the case to a non-supervisor
by cr_id: keep if _n==1
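
**Optional check (a sketch, not part of the original pipeline): confirm one record per
**CR ID remains, as the later m:1 merge on cr_id requires. The missok option is used
**because records with a missing cr_id are not dropped in this section
isid cr_id, missok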

**Label race category
label define race_lab 1 "White" 2 "Black" 3 "Other" 
label values inv_race race_lab

**Save investigator data
save ${rep_root}/data/inv, replace
clear

**ACCUSED
**Load raw accused data
import delimited using ${rep_root}/data/FOIA/FOIA_complaints_21-060-216_04282021/clear_accused.csv, bindquote(strict) maxquotedrows(1000)

**There are some pure duplicates in this dataset (every datum identical). No idea why
duplicates drop

**Rename record ID number to standardize with other files
rename log_no cr_id
drop if missing(cr_id)

**Save original ordering of the dataset to break ties later
gen row_no = _n

**Categorize allegation codes as acc_cat
sort allegation_category_cd 
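**Keep only master and matched records so that unused categories in the lookup file
**do not add rows with a missing cr_id
merge m:1 allegation_category_cd using ${rep_root}/data/acc_cat, keep(1 3) nogen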
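
**Optional check (a sketch, not part of the original pipeline): count records whose
**allegation code received no category from the lookup file
count if missing(acc_cat)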
rename acc_cat acc_cat_i

**Store accusation id number
rename accusation_id acc_id

**Record name and appointment date of accused officer
gen acc_fname = trim(first_name)
gen acc_lname = trim(last_name)
gen acc_mi = trim(middle_initial)
gen acc_appoint_dt = date(appointed_date, "DMY", 2025)

**Record indicator for white, indicator for male, birth year, position, and race of accused officer
gen acc_white = race=="WHI"
replace acc_white = . if missing(race)

gen acc_male = gender=="M"
replace acc_male = . if missing(gender)

rename birth_year acc_byr

rename position_at_complaint acc_rank

rename race acc_race

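**Standardize penalty length: code separations as 9999 days, and reprimands and
**noted violations as 0 days
replace no_of_days = 9999 if penalty_cd == "SEPARATION"
replace no_of_days = 0 if penalty_cd == "REPRIMAND" | penalty_cd == "VIOLATION NOTED"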


**Record whether the last finding in the history of a given allegation is sustained
**(acc_sustained_i) or either sustained or not sustained (acc_investigated_i).
**Then mark whether the officer has any such allegation in the complaint, and record
**the maximum number of suspension days across the officer's sustained allegations
sort cr_id acc_id allegation_category_cd row_no
by cr_id acc_id allegation_category_cd: gen acc_sustained_i = finding_cd=="SUSTAINED" & _n==_N
by cr_id acc_id allegation_category_cd: gen acc_investigated_i = inlist(finding_cd, "SUSTAINED", "NOT SUSTAINED") & _n==_N
by cr_id acc_id allegation_category_cd: gen acc_susp_days_i = no_of_days if acc_sustained_i==1
by cr_id acc_id: egen acc_sustained = max(acc_sustained_i)
by cr_id acc_id: egen acc_investigated = max(acc_investigated_i)
by cr_id acc_id: egen acc_susp_days = max(acc_susp_days_i)
by cr_id acc_id: egen acc_cat = max(acc_cat_i)
drop acc_sustained_i acc_susp_days_i

**Record an alternative outcome (deprecated)
*gen acc_investigated = inlist(finding_cd, "NOT SUSTAINED", "EXONERATED", "ADDITIONAL INVESTIGATION REQUESTED")

**Record indicators for any officer information missing
gen miss_first = missing(acc_fname)
gen miss_last = missing(acc_lname)
gen miss_mi = missing(acc_mi)
gen miss_appoint = missing(acc_appoint_dt)

**Keep the record from each accusation with the minimum amount of officer-identifying
**information missing, breaking ties by the original row order
sort cr_id acc_id miss_last miss_first miss_appoint miss_mi row_no
by cr_id acc_id: keep if _n==1

**Keep cr_id and created variables
keep cr_id acc_* finding_cd

**Save accused data (with potentially multiple observations per complaint 
**to include every accused officer)
save ${rep_root}/data/acc, replace

**Merge on investigator data, keeping only complaints for which we have investigator information
sort cr_id
merge m:1 cr_id using ${rep_root}/data/inv, keep(2 3) gen(inv_merge)

**Merge on victim and incident information
sort cr_id
merge m:1 cr_id using ${rep_root}/data/vic, keep(1 3) gen(vic_merge)

sort cr_id
merge m:1 cr_id using ${rep_root}/data/inc, keep(1 3) gen(inc_merge)

**Create variables that indicate whether we have each kind of data
gen acc_data = inv_merge==3
gen vic_data = vic_merge==3
gen inc_data = inc_merge==3
drop inv_merge vic_merge inc_merge

**Create age of officer at accusation (only possible after merging incident and accused data)
gen acc_age = year(inc_complaint_dt_1) - acc_byr
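
**Optional check (a sketch, not part of the original pipeline): count implausible
**officer ages. The bounds 18 and 80 are illustrative assumptions, not values from the data
count if !missing(acc_age) & !inrange(acc_age, 18, 80)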

save ${rep_root}/data/complaints_2021, replace
clear
log close
